Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs that have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to do high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, demonstrating a rate of up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
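The abstract does not say which quantization scheme the teams used. As a hedged illustration of what an "INT8 model" stores, the sketch below shows generic per-tensor symmetric INT8 quantization (the scale is chosen so the largest magnitude maps to 127) and the matching dequantization. The function names are hypothetical, and this is not the challenge's or the NPU toolchain's API. For context on the runtime target, 60 FPS leaves a budget of roughly 16.7 ms per Full HD frame.

```python
def symmetric_scale(values: list[float]) -> float:
    """Per-tensor symmetric scale: the largest |x| maps to the int8 value 127."""
    max_abs = max(abs(v) for v in values)
    return max_abs / 127 if max_abs > 0 else 1.0


def quantize_int8(values: list[float], scale: float, zero_point: int = 0) -> list[int]:
    """q = clamp(round(x / scale) + zero_point, -128, 127).

    Python's round() uses round-half-to-even; real toolchains may round differently.
    """
    return [max(-128, min(127, round(v / scale) + zero_point)) for v in values]


def dequantize_int8(q_values: list[int], scale: float, zero_point: int = 0) -> list[float]:
    """Map int8 codes back to approximate real values: x ~ scale * (q - zero_point)."""
    return [scale * (q - zero_point) for q in q_values]


activations = [0.5, -1.0, 0.25, 1.27]  # made-up activation values for illustration
scale = symmetric_scale(activations)
codes = quantize_int8(activations, scale)
restored = dequantize_int8(codes, scale)
print(codes, restored)
```

With a symmetric scheme the zero point stays 0, which keeps integer matrix multiplies simple; asymmetric schemes use a nonzero zero point to cover skewed ranges such as post-ReLU activations.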
Federated learning (FL) enables the building of robust and generalizable AI models by leveraging diverse datasets from multiple collaborators without centralizing the data. We created NVIDIA FLARE as an open-source software development kit (SDK) to make it easier for data scientists to use FL in their research and real-world applications. The SDK includes solutions for state-of-the-art FL algorithms and federated machine learning approaches, which facilitate building workflows for distributed learning across enterprises and enable platform developers to create a secure, privacy-preserving offering for multiparty collaboration utilizing homomorphic encryption or differential privacy. The SDK is a lightweight, flexible, and scalable Python package, and allows researchers to bring their data science workflows implemented in any training library (PyTorch, TensorFlow, XGBoost, or even NumPy) and apply them in real-world FL settings. This paper introduces the key design principles of FLARE and illustrates some use cases (e.g., COVID analysis) with customizable FL workflows that implement different privacy-preserving algorithms. Code is available at https://github.com/NVIDIA/NVFlare.
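FLARE's own API is not described in this abstract, so the sketch below does not use it. It shows federated averaging (FedAvg), a standard aggregation rule in which a server averages client model weights in proportion to each client's local sample count, so raw data never leaves the clients. Weights are plain Python lists keyed by hypothetical parameter names.

```python
Weights = dict[str, list[float]]


def fed_avg(client_updates: list[tuple[Weights, int]]) -> Weights:
    """Average client weights, weighting each client by its number of local samples."""
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training samples reported")
    first, _ = client_updates[0]
    # Start from zeros with the same parameter names and shapes as the first client.
    averaged = {name: [0.0] * len(vec) for name, vec in first.items()}
    for weights, n in client_updates:
        for name, vec in weights.items():
            for i, v in enumerate(vec):
                averaged[name][i] += v * n / total
    return averaged


# Two hypothetical clients: the second holds three times as much data.
global_weights = fed_avg([
    ({"layer.weight": [1.0, 2.0]}, 1),
    ({"layer.weight": [3.0, 4.0]}, 3),
])
print(global_weights)
```

Privacy mechanisms such as differential privacy or homomorphic encryption would wrap this step (noising or encrypting the client updates before aggregation) rather than replace it.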
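Segmenting plant point clouds to obtain high-precision morphological traits is crucial for plant phenotyping and crop breeding. Although the boom in deep learning methods has driven a large body of research on plant point cloud segmentation, most works follow the common practice of hard voxelization-based or downsampling-based methods. They are limited to segmenting simple plant organs and overlook the difficulty of resolving complex plant point clouds with high spatial resolution. In this study, we propose a deep learning network, Plant Segmentation Transformer (PST), to achieve semantic and instance segmentation of MLS (mobile laser scanning) point clouds of oilseed rape, which are characterized by tiny siliques and dense points. PST is composed of: (i) a dynamic voxel feature encoder (DVFE) that aggregates per-point features at the raw spatial resolution; (ii) dual window sets attention blocks that capture contextual information; and (iii) a dense feature propagation module that produces the final dense point feature map. The results show that PST and PST-PointGroup (PG) achieve state-of-the-art performance on semantic and instance segmentation tasks. For semantic segmentation, PST reached 93.96%, 97.29%, 96.52%, 96.88%, and 97.07% in mean IoU, mean precision, mean recall, mean F1-score, and overall accuracy, respectively. For instance segmentation, PST-PG reached 89.51%, 89.85%, 88.83%, and 82.53% in mCov, mWCov, mPrec90, and mRec90, respectively. This study extends the phenotyping of oilseed rape in an end-to-end way and demonstrates that deep learning methods have great potential for understanding dense plant point clouds with complex morphological traits.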
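The semantic segmentation scores reported for PST are standard per-point metrics. The abstract does not say exactly how they were averaged, so the sketch below assumes macro-averaging over classes; the class ids (for example 0 = stem, 1 = silique) are hypothetical. Instance metrics such as mCov and mWCov need instance matching and are not covered here.

```python
def semantic_metrics(pred: list[int], gt: list[int], num_classes: int) -> dict[str, float]:
    """Per-point semantic segmentation metrics, macro-averaged over classes."""
    tp = [0] * num_classes
    fp = [0] * num_classes
    fn = [0] * num_classes
    for p, g in zip(pred, gt):
        if p == g:
            tp[g] += 1
        else:
            fp[p] += 1  # point wrongly assigned to class p
            fn[g] += 1  # point of class g that was missed

    def ratio(num: float, den: float) -> float:
        return num / den if den else 0.0

    classes = range(num_classes)
    iou = [ratio(tp[c], tp[c] + fp[c] + fn[c]) for c in classes]
    precision = [ratio(tp[c], tp[c] + fp[c]) for c in classes]
    recall = [ratio(tp[c], tp[c] + fn[c]) for c in classes]
    f1 = [ratio(2 * p * r, p + r) for p, r in zip(precision, recall)]

    def mean(xs: list[float]) -> float:
        return sum(xs) / len(xs)

    return {
        "mIoU": mean(iou),
        "mPrecision": mean(precision),
        "mRecall": mean(recall),
        "mF1": mean(f1),
        "OA": ratio(sum(tp), len(gt)),  # fraction of all points labeled correctly
    }


print(semantic_metrics(pred=[0, 0, 1, 1], gt=[0, 1, 1, 1], num_classes=2))
```

Overall accuracy is dominated by large classes, while the macro-averaged scores weight a small organ class equally with a large one, which is why papers usually report both.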
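Reading comprehension is a complex cognitive process involving many human brain activities. Plenty of works have studied the reading patterns and attention allocation of reading comprehension in information retrieval-related scenarios. However, little is known about what happens in the human brain during reading comprehension and how these cognitive activities affect information retrieval processes. Additionally, with advances in brain imaging techniques such as electroencephalography (EEG), brain signals can be collected almost in real time, and it is worth exploring whether they can serve as feedback to improve information acquisition performance. In this paper, we carefully design a lab-based user study to investigate brain activities during reading comprehension. Our findings show that neural responses vary with different types of reading content, i.e., content that can satisfy users' information needs and content that cannot. We suggest that various cognitive activities, e.g., cognitive loading, semantic-thematic understanding, and inferential processing, are supported at the micro-time scale during reading comprehension. From these findings, we illustrate several insights for information retrieval tasks, such as ranking model construction and interface design. Besides, we suggest the possibility of detecting reading comprehension status for proactive real-world systems. To this end, we propose a unified framework for EEG-based reading comprehension modeling (UERCM). To verify its effectiveness, we conduct extensive experiments based on EEG features for two reading comprehension tasks: answer sentence classification and answer extraction. The results show that it is feasible to improve the performance of both tasks with brain signals.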
Making sense of multiple modalities can yield a more comprehensive description of real-world phenomena. However, learning the co-representation of diverse modalities is still a long-standing endeavor in emerging machine learning applications and research. Previous generative approaches for multimodal input approximate a joint-modality posterior by uni-modality posteriors as a product-of-experts (PoE) or mixture-of-experts (MoE). We argue that these approximations lead to a defective bound for the optimization process and a loss of semantic connection among modalities. This paper presents a novel variational method on sets called the Set Multimodal VAE (SMVAE) for learning a multimodal latent space while handling the missing modality problem. By modeling the joint-modality posterior distribution directly, the proposed SMVAE learns to exchange information between multiple modalities and compensates for the drawbacks caused by factorization. On public datasets from various domains, the experimental results demonstrate that the proposed method is applicable to order-agnostic cross-modal generation while achieving outstanding performance compared to state-of-the-art multimodal methods. The source code for our method is available online at https://anonymous.4open.science/r/SMVAE-9B3C/.
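To make the product-of-experts baseline that SMVAE argues against concrete, the sketch below combines one-dimensional Gaussian posteriors from several modalities. The normalized product of Gaussians is itself Gaussian, with precision equal to the sum of the expert precisions and a precision-weighted mean. Including a standard-normal prior expert is a common convention, assumed here. This is the factorized baseline, not SMVAE itself, and the function name is hypothetical.

```python
def product_of_gaussian_experts(
    experts: list[tuple[float, float]], include_prior: bool = True
) -> tuple[float, float]:
    """Combine 1-D Gaussian experts given as (mean, variance) into their normalized product."""
    if include_prior:
        experts = experts + [(0.0, 1.0)]  # standard-normal prior expert (assumed convention)
    precision = sum(1.0 / var for _, var in experts)
    mean = sum(mu / var for mu, var in experts) / precision
    return mean, 1.0 / precision


# Two hypothetical modality posteriors for one latent dimension.
image_expert = (1.0, 1.0)
text_expert = (3.0, 1.0)
print(product_of_gaussian_experts([image_expert, text_expert], include_prior=False))
# A missing modality is handled by simply leaving its expert out of the list.
print(product_of_gaussian_experts([image_expert]))
```

Because every expert's precision adds up, one overconfident modality can dominate the joint posterior. That is one way a factorized approximation can lose information that a directly modeled joint posterior might keep.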
Given that rich information is hidden behind ubiquitous numbers in text, numerical reasoning over text should be an essential skill of AI systems. To derive precise equations to solve numerical reasoning problems, previous work focused on modeling the structures of equations and has proposed various structured decoders. Though structure modeling proves to be effective, these structured decoders construct a single equation in a predefined autoregressive order, potentially placing an unnecessary restriction on how a model should grasp the reasoning process. Intuitively, humans may have numerous thoughts popping up in no predefined order; these thoughts are not limited to the problem at hand and can even concern other related problems. By comparing diverse thoughts and chaining relevant pieces, humans are less prone to errors. In this paper, we take this inspiration and propose CANTOR, a numerical reasoner that models reasoning steps using a directed acyclic graph, in which we produce diverse reasoning steps simultaneously without predefined decoding dependencies and then compare and chain relevant ones to reach a solution. Extensive experiments demonstrated the effectiveness of CANTOR under both fully supervised and weakly supervised settings.
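The abstract does not give CANTOR's graph format. As a hypothetical illustration of why a DAG of reasoning steps can hold more than one equation, the sketch below evaluates arithmetic steps whose operands are either numbers from the text or earlier steps, in topological order (`graphlib`, Python 3.9+). Steps not chained to the chosen answer node are still computed but do not affect the result. The node names and the `(op, left, right)` encoding are assumptions for illustration only.

```python
import operator
from graphlib import TopologicalSorter

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def evaluate_dag(
    numbers: dict[str, float],
    steps: dict[str, tuple[str, str, str]],
    answer: str,
) -> float:
    """Evaluate reasoning steps (op, left, right) over text numbers; return the answer node."""
    # Each step depends on its two operands; leaves are the numbers from the text.
    graph = {node: {left, right} for node, (_, left, right) in steps.items()}
    values = dict(numbers)
    for node in TopologicalSorter(graph).static_order():
        if node in values:  # a number taken directly from the text
            continue
        op, left, right = steps[node]
        values[node] = OPS[op](values[left], values[right])
    return values[answer]


numbers = {"n0": 3.0, "n1": 4.0, "n2": 10.0}  # hypothetical numbers extracted from a passage
steps = {
    "s0": ("*", "n0", "n1"),  # 3 * 4
    "s1": ("-", "n2", "n0"),  # a candidate step that is not chained to the answer
    "s2": ("+", "s0", "n2"),  # (3 * 4) + 10
}
print(evaluate_dag(numbers, steps, answer="s2"))
```

Unlike a single left-to-right equation, the graph lets several candidate steps coexist, and choosing the answer node decides which chain of steps is used.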
Prompt learning has recently become an effective linguistic tool for eliciting the knowledge of pre-trained language models (PLMs) in few-shot tasks. However, studies have shown that prompt learning still lacks robustness, since a suitable initialization of continuous prompts and expert-crafted manual prompts are essential in the fine-tuning process. Moreover, humans also use their comparative ability to draw on their existing knowledge when distinguishing between examples. Motivated by this, we explore how to use contrastive samples to strengthen prompt learning. Specifically, we first propose ConsPrompt, a model that combines a prompt encoding network, a contrastive sampling module, and a contrastive scoring module. Subsequently, two sampling strategies, a similarity-based and a label-based strategy, are introduced to realize differential contrastive learning. The effectiveness of ConsPrompt is demonstrated on five different few-shot learning tasks, and the similarity-based sampling strategy is shown to be more effective than the label-based one when combined with contrastive learning. Our results also exhibit state-of-the-art performance and robustness in different few-shot settings, which indicates that ConsPrompt can serve as a better knowledge probe for eliciting the knowledge of PLMs.
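The abstract names two sampling strategies without detailing them. The sketch below is one hedged reading, with hypothetical function names and toy embeddings. Similarity-based sampling ranks a pool of example embeddings by cosine similarity to the anchor and keeps the top k. Label-based sampling picks examples that share the anchor's label (positives) or differ from it (negatives).

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def similarity_based_sample(anchor: list[float], pool: list[list[float]], k: int) -> list[int]:
    """Indices of the k pool embeddings most similar to the anchor."""
    ranked = sorted(range(len(pool)), key=lambda i: cosine(anchor, pool[i]), reverse=True)
    return ranked[:k]


def label_based_sample(anchor_label: str, labels: list[str], k: int) -> tuple[list[int], list[int]]:
    """Up to k same-label (positive) and k different-label (negative) example indices."""
    positives = [i for i, y in enumerate(labels) if y == anchor_label][:k]
    negatives = [i for i, y in enumerate(labels) if y != anchor_label][:k]
    return positives, negatives


pool = [[0.0, 1.0], [1.0, 0.1], [0.9, 0.0], [-1.0, 0.0]]  # toy sentence embeddings
print(similarity_based_sample([1.0, 0.0], pool, k=2))
print(label_based_sample("pos", ["pos", "neg", "pos", "neg"], k=1))
```

One plausible reason similarity-based sampling could help is that it surfaces examples that look alike but may carry different labels, which are harder contrasts than those chosen by label alone. The abstract does not state this mechanism.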
Multi-intent detection and slot filling joint models are gaining increasing traction since they are closer to complicated real-world scenarios. However, existing approaches (1) focus on identifying implicit correlations between utterances and one-hot encoded labels in both tasks while ignoring explicit label characteristics; (2) directly incorporate multi-intent information for each token, which could lead to incorrect slot prediction due to the introduction of irrelevant intent. In this paper, we propose a framework termed DGIF, which first leverages the semantic information of labels to give the model additional signals and enriched priors. Then, a multi-grain interactive graph is constructed to model correlations between intents and slots. Specifically, we propose a novel approach to construct the interactive graph based on the injection of label semantics, which can automatically update the graph to better alleviate error propagation. Experimental results show that our framework significantly outperforms existing approaches, obtaining a relative improvement of 13.7% over the previous best model on the MixATIS dataset in overall accuracy.
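Deep learning techniques for image super-resolution (SR) with fixed scales have achieved great success. To improve real-world applicability, numerous models have also been proposed to restore SR images with arbitrary scale factors, including asymmetric ones in which images are resized to different scales along the horizontal and vertical directions. Although most models are optimized only for the unidirectional upscaling task while assuming a predefined downscaling kernel for low-resolution (LR) inputs, recent models based on invertible neural networks (INN) are able to significantly increase upscaling accuracy by jointly optimizing the downscaling and upscaling cycle. However, limited by the INN architecture, they are constrained to fixed integer scale factors and require one model for each scale. Without increasing model complexity, this work proposes a simple and effective invertible arbitrary rescaling network (IARN) that achieves arbitrary image rescaling by training only one model. Using innovative components such as position-aware scale encoding and preemptive channel splitting, the network is optimized to convert the non-invertible rescaling cycle into an effectively invertible process. It is shown to achieve state-of-the-art (SOTA) performance in bidirectional arbitrary rescaling without compromising perceptual quality in the LR outputs. It is also shown to perform well on tests with asymmetric scales using the same network architecture.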
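The invertibility that INN-based rescaling relies on usually comes from coupling layers. The sketch below is a NICE-style additive coupling layer, a standard INN building block and not IARN's actual architecture. The input is split into two parts (in rescaling networks these are typically channel groups). One part passes through unchanged, and the other is shifted by an arbitrary function of the first, so the layer can be inverted exactly even though the shift function itself is not invertible. `shift_net` is a hypothetical stand-in for a learned sub-network.

```python
import math


def shift_net(x1: list[float]) -> list[float]:
    """Hypothetical stand-in for an arbitrary, non-invertible sub-network t(x1)."""
    return [3.0 * math.sin(x1[0]), x1[0] * x1[1]]


def coupling_forward(x1: list[float], x2: list[float]) -> tuple[list[float], list[float]]:
    """Additive coupling: y1 = x1, y2 = x2 + t(x1)."""
    shift = shift_net(x1)
    return list(x1), [a + b for a, b in zip(x2, shift)]


def coupling_inverse(y1: list[float], y2: list[float]) -> tuple[list[float], list[float]]:
    """Exact inverse: x1 = y1, x2 = y2 - t(y1), because y1 equals x1."""
    shift = shift_net(y1)
    return list(y1), [a - b for a, b in zip(y2, shift)]


x1, x2 = [0.2, -0.5], [1.0, 2.0]  # two hypothetical channel groups
y1, y2 = coupling_forward(x1, x2)
r1, r2 = coupling_inverse(y1, y2)
print(r1, r2)  # recovers x1 and x2 up to floating-point rounding
```

Stacking such layers and alternating which part is shifted gives a network whose forward pass can serve as the downscaler and whose inverse can serve as the upscaler. Fixed split sizes are one reason such designs tie a model to a fixed integer scale.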
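Medical vision-and-language pre-training (Med-VLP) has received considerable attention owing to its applicability to extracting generic vision-and-language representations from medical images and texts. Most existing methods mainly consist of three elements: uni-modal encoders (i.e., a vision encoder and a language encoder), a multi-modal fusion module, and pretext tasks. Few studies consider the importance of medical domain expert knowledge and explicitly exploit such knowledge to facilitate Med-VLP. Although knowledge-enhanced vision-and-language pre-training (VLP) methods exist in the general domain, most require off-the-shelf toolkits (e.g., object detectors and scene graph parsers), which are unavailable in the medical domain. In this paper, we propose a systematic and effective approach to enhance Med-VLP with structured medical knowledge from three perspectives. First, considering that knowledge can be regarded as an intermediate medium between vision and language, we align the representations of the vision encoder and the language encoder through knowledge. Second, we inject knowledge into the multi-modal fusion model so that the model can reason using knowledge as a supplement to the input image and text. Third, we guide the model to emphasize the most critical information in images and texts by designing knowledge-induced pretext tasks. To perform a comprehensive evaluation and facilitate further research, we construct a medical vision-and-language benchmark that includes three tasks. Experimental results illustrate the effectiveness of our approach, which achieves state-of-the-art performance on all downstream tasks. Further analyses explore the effects of different components of our approach and various pre-training settings.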